fix(studio): filter runtime-generated nodes from resolver-shadow telemetry by vanceingalls · Pull Request #1795 · heygen-com/hyperframes

vanceingalls · 2026-06-30T07:06:48Z

What

Reduce noise in the studio:sdk_resolver_shadow telemetry tripwire and add metadata to triage what remains. Telemetry-only — no disk writes, no change to user-visible edits.

Why

The resolver-shadow tripwire (PR #1547) emits element_not_found whenever the SDK session can't resolve an hf-id the server patch path targets. PostHog (SDK Resolver Shadow Parity) showed the dominant live class is runtime-generated nodes — elements a composition <script> creates at runtime (caption word/group spans, etc.). These have no static data-hf-id, so the SDK session (a static parse) cannot model them by design. They are noise, not a resolver bug, and they were drowning the genuine v0.6.110-class divergences the tripwire exists to catch.

Investigation (last 14d of events) ruled out the alternatives:

Content mismatch / stale session — ruled out: comp == selFile in 39/40 top rows.
Already-fixed class-1 (inlined sub-comp bare-leaf, fix(studio): shadow resolves bare leaf via dispatch path, not getElement #1552/v0.6.112) — the historical top-firers (hf-0ytc etc.) are all pre-fix 0.6.111 noise.
The biggest live firer, captions.html .captions-container, traces to packages/studio/src/captions/generator.ts building word/group nodes via createElement at runtime.

Changes

Runtime-node filter (sdkResolverShadowCheck / runResolverShadow): suppress element_not_found when the resolved hf-id is absent from the on-disk source. An id present in source but missing from the session stays flagged — that's the real resolver divergence. The "is it in source" test is the independent signal; "is it in the session" is circular (that's what the tripwire measures). Scoped to the DOM-edit path, where every observed runtime-node divergence lives.
sessionElementCount on all element_not_found / animation_not_found emits: 0 = empty/broken session (actionable), >0 = element-specific.
sourceHfIdCount on emitted element_not_found: =1 = a single static node the parse dropped (foreign-content exclusion / sub-comp inlining gap), >1 = duplicate ids → resolver picked the wrong instance. Forks the one remaining unexplained bucket (#soe-comment-style template ids) directly from telemetry.

Test

packages/studio/src/utils/sdkResolverShadow.test.ts — added cases: absent-from-source → suppressed; present-in-source-but-missing → still flagged; duplicate-id → sourceHfIdCount: 2. 32 pass under vitest.

Not in scope

Root-cause fix for the template-id bucket — needs the sourceHfIdCount split from a day of telemetry first.
Element-targeted paths (setTiming / removeElement / reorder / addGsapTween) have no sync source access and are low-volume; left unguarded.

🤖 Generated with Claude Code

…metry The sdk_resolver_shadow tripwire flagged element_not_found for nodes a composition <script> creates at runtime (caption word/group spans, etc.). These have no static data-hf-id, so the SDK session (a static parse) cannot model them by design; the divergence is noise, not a resolver bug. - Runtime-node filter: suppress element_not_found when the resolved hf-id is absent from the on-disk source. An id PRESENT in source but missing from the session stays flagged (the genuine v0.6.110-class resolver divergence). - Add sessionElementCount to all element_not_found / animation_not_found emits (0 = empty/broken session, >0 = element-specific). - Add sourceHfIdCount to emitted element_not_found: =1 = static node the parse dropped (foreign-content exclusion / sub-comp gap), >1 = duplicate-id resolver ambiguity. Scoped to the DOM-edit path. Telemetry-only; no disk writes, no edit change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vanceingalls · 2026-06-30T07:07:01Z

fix(studio): filter runtime-generated nodes from resolver-shadow telemetry #1795 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

miga-heygen

Solid, well-scoped PR. The runtime-node filter logic is sound, the telemetry metadata enrichments are useful for triaging the remaining buckets, and the test coverage hits the important cases. CI is green on all required checks. A few observations:

1. includes vs countHfIdInSource semantic gap (minor — metadata inconsistency, not a bug)

The filter gate at L180 uses sourceContent.includes(hfId) (raw substring), while countHfIdInSource uses the attribute-scoped data-hf-id="${id}". For an id like hf-a where source only contains data-hf-id="hf-abc", includes("hf-a") returns true (correctly keeps the signal — no false suppression), but countHfIdInSource("...", "hf-a") returns 0. The emitted event would carry sourceHfIdCount: 0 for an element_not_found that survived the filter — a "present in source but zero attribute matches" state that could confuse triage dashboards. The bias direction is correct (overshoot toward keeping signal), but making the filter gate use countHfIdInSource(sourceContent, hfId) > 0 instead of includes would keep both checks on the same semantic level and avoid the zero-count anomaly. Not blocking — the noise budget on this edge is tiny (hf-ids with shared prefixes are uncommon) and the "ponytail" comment documents the intent.

2. sessionElementCount double-call in runResolverShadow

sdkResolverShadowCheck already calls resolveSnapshot → session.getElements() internally, and then runResolverShadow calls session.getElements().length again for the telemetry payload. Since getElements() returns a fresh snapshot array each time, this is a second full traversal. For the telemetry-only hot path on every style/text/attr edit, the double call is worth noting. Not blocking since the check only reaches telemetry on divergence (early return on mismatches.length === 0), so the second call fires only on the already-rare emit path.

3. Test coverage

The three new C8 cases cover the filter logic well (suppressed / kept / count). One thing that would strengthen the suite: a test for the includes vs countHfIdInSource gap itself — e.g. id "hf-a" against source data-hf-id="hf-abc" — to document the expected behavior when the loose match hits but the tight count doesn't. Not blocking; flagging it as a future hardening opportunity.

4. Everything else looks clean

countHfIdInSource split technique is correct (no regex escaping needed for hf-ids).
sourceContent?: string optional parameter is backward compatible.
Existing redaction untouched — no PII risk from the new count fields.
The intentional exclusion of recordResolverParity / recordAnimationResolverParity from the runtime-node filter is well-documented in the PR body.
recordAnimationResolverParity caches session.getElements() to avoid the double-call — nice touch.

Overall: clean telemetry-only change with correct bias direction. LGTM with the minor notes above.

— Miga

miguel-heygen

Reviewed 10743d85.

Verdict: APPROVE.

The runtime-node filter is directionally correct and scoped to resolver-shadow telemetry only: when Studio resolves a live DOM hf-id that is absent from the on-disk source, the event is suppressed because the SDK static session cannot model script-created runtime nodes by design. Static ids that are present in source but missing from the SDK session still emit, preserving the actionable v0.6.110-class resolver-divergence signal.

Verified:

useDomEditSession now passes originalContent only into the shadow check path; edit persistence behavior is unchanged.
sdkResolverShadowCheck suppresses absent-from-source element_not_found cases and keeps present-in-source missing-session cases.
sourceHfIdCount is telemetry metadata only; source content itself is not emitted.
sessionElementCount is added to element/animation not-found telemetry, which should make empty-session vs element-specific buckets easier to triage.
Tests cover the important suppression/keep/count cases.

Non-blocking: I agree with Miga that the sourceContent.includes(hfId) gate and countHfIdInSource() use slightly different semantics. A shared countHfIdInSource(sourceContent, hfId) > 0 gate would avoid rare substring-prefix anomalies (hf-a vs hf-abc) where an event survives with sourceHfIdCount: 0. The current bias is safe because it keeps extra signal instead of suppressing a real bug.

CI has no failures at review time; Windows render/test jobs are still pending.
— Magi

james-russo-rames-d-jusso

Reviewed 10743d85 — layering on Miga's COMMENTED and Magi's APPROVED reviews from this morning. ✅ LGTM (telemetry-only, scoped, correct bias direction).

Summary

Adds a runtime-node filter (sourceContent.includes(hfId) gate) to suppress noisy element_not_found for script-created nodes, plus sessionElementCount on all emit sites and sourceHfIdCount on the DOM-edit path's element_not_found events. Filter live only on runResolverShadow (DOM-edit path) — the element-targeted cutover chokepoints (recordResolverParity, recordAnimationResolverParity) stay unguarded, which the PR body documents as deliberate (no sync source access; low-volume).

Findings

🟡 Dashboard-consumer impact of the `includes` vs `countHfIdInSource` gap (extending Miga's finding 1)

Miga's note about the semantic gap is mechanically correct (verified — "data-hf-id=\"hf-abc\"".includes("hf-a") is true; countHfIdInSource(src, "hf-a") returns 0). The downstream consequence is worth surfacing explicitly: the PR body and the comment at sdkResolverShadow.ts:277-281 document sourceHfIdCount as a {=1: single-static-parse-drop, >1: duplicate-id} two-bucket signal, but the gap introduces a third bucket — =0 — for ids that pass the loose includes gate as a longer-id prefix. Anyone querying PostHog with sourceHfIdCount == 1 to size the "parser dropped a static node" cohort will miss those events, and sourceHfIdCount > 0 won't catch them either.

Two cheap fixes, take either:

Tighten the gate to countHfIdInSource(sourceContent, hfId) > 0 (Miga's suggestion) — keeps the two-bucket invariant by construction.
Keep the loose gate, but document =0 as the third explicit bucket ("loose-prefix match — same disposition as runtime-only; suppress at dashboard layer") and add a regression test pinning the behavior. The "ponytail" comment already nods at this; just make the dashboard-side intent explicit.

I'd take (1) — the cost is one extra split-pair on the hot-path early return, and it preserves the documented mental model. But not blocking.

🟡 Test for the `includes`/count boundary (seconding Miga's #3)

A test like the one Miga sketched — id "hf-a" against source data-hf-id="hf-abc" — would pin the current behavior either way (whether you take fix #1 or fix #2 above). The C8 cases as-written cover the canonical paths (runtime-only / static / dup) but don't cover the prefix-overlap corner that's now load-bearing for the filter's correctness.

🟡 `runResolverShadow` failure-path coverage (per the observability rubric)

The new metadata is only emitted on the element_not_found mismatch branch (correct). But the existing dispatch_error branch at sdkResolverShadow.ts:214 returns a mismatch without the runtime-node filter being consulted (the filter only sits in the unresolved-snapshot early return at L177-187), so a dispatch_error on an actually-runtime-only hfId would still emit. Not a new gap — same behavior as before this PR — but the PR's stated goal is "reduce noise on the DOM-edit path." If captions.html-class runtime ids ever reach the dispatch path (i.e. resolveSnapshot succeeds via the bare-match fallback at L86-89), they'd skip the new filter. Worth a follow-up grep to confirm the bare-match fallback never resolves a runtime-only id to a real session element. If it does, that's a different noise bucket the filter doesn't catch.

✅ Verified clean

originalContent at useDomEditCommits.ts:153-157 is the full file body (read via /api/projects/.../files/<targetPath>), so the includes substring search has the right surface — it's not scoped to a selection subtree.
trackStudioEvent (studioTelemetry.ts:69-77) is synchronous queue-push; no import().then async-import race for early-call telemetry loss (one of the failure-modes I always check on telemetry PRs — clean here).
PostHog payload values are bounded: sessionElementCount is an int, sourceHfIdCount is an int, no user content. No cardinality / PII concern from the new fields. Existing redactMismatches redaction of expected/actual is untouched.
recordAnimationResolverParity cache of session.getElements() (L344) is a nice incidental cleanup; matches Miga's note.
The intentional exclusion of recordResolverParity / recordAnimationResolverParity from the source-filter is documented in the PR body ("low-volume; left unguarded") and the source-access constraint is real — sdkTimingPersist / sdkGsapTweenPersist don't read the file.

✅ Sibling-asymmetry check (clean)

The three trackStudioEvent("sdk_resolver_shadow", ...) emit sites at L272 / L312 / L347 are intentionally asymmetric on sourceHfIdCount (DOM-edit only) and intentionally symmetric on sessionElementCount (all three). Matches the PR-body intent. The redaction shape is identical across all three.

Verdict

LGTM. The includes vs count gap (Miga's #1) is the most actionable polish; my dashboard-bucket reframe makes the "fix vs document" tradeoff explicit. None of it blocks merge — the bias direction is correct (keep signal when in doubt), and the documented goal of cutting captions.html-class noise is preserved.

— Rames D Jusso

miga-heygen reviewed Jun 30, 2026

View reviewed changes

miguel-heygen approved these changes Jun 30, 2026

View reviewed changes

vanceingalls merged commit a4eacae into main Jun 30, 2026
54 of 60 checks passed

vanceingalls deleted the sdk-resolver-shadow-runtime-filter branch June 30, 2026 07:16

james-russo-rames-d-jusso reviewed Jun 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(studio): filter runtime-generated nodes from resolver-shadow telemetry#1795

fix(studio): filter runtime-generated nodes from resolver-shadow telemetry#1795
vanceingalls merged 1 commit into
mainfrom
sdk-resolver-shadow-runtime-filter

vanceingalls commented Jun 30, 2026 •

edited

Loading

Uh oh!

vanceingalls commented Jun 30, 2026

Uh oh!

miga-heygen left a comment

Uh oh!

miguel-heygen left a comment

Uh oh!

Uh oh!

james-russo-rames-d-jusso left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

vanceingalls commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

Changes

Test

Not in scope

Uh oh!

vanceingalls commented Jun 30, 2026

Uh oh!

miga-heygen left a comment

Choose a reason for hiding this comment

Uh oh!

miguel-heygen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

james-russo-rames-d-jusso left a comment

Choose a reason for hiding this comment

Summary

Findings

🟡 Dashboard-consumer impact of the includes vs countHfIdInSource gap (extending Miga's finding 1)

🟡 Test for the includes/count boundary (seconding Miga's #3)

🟡 runResolverShadow failure-path coverage (per the observability rubric)

✅ Verified clean

✅ Sibling-asymmetry check (clean)

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

vanceingalls commented Jun 30, 2026 •

edited

Loading

🟡 Dashboard-consumer impact of the `includes` vs `countHfIdInSource` gap (extending Miga's finding 1)

🟡 Test for the `includes`/count boundary (seconding Miga's #3)

🟡 `runResolverShadow` failure-path coverage (per the observability rubric)